Nonlinear growth models represent an instance of nonlinear regression models, a class of models taking the general form \[ y = \mu(x, \theta) + \epsilon, \] where \(\mu(x, \theta)\) is the mean function which depends on a possibly vector-valued parameter \(\theta\), and a possibly vector-valued predictor \(x\). The stochastic component \(\epsilon\) represents the error with mean zero and constant variance. Usually, a Gaussian distribution is also assumed for the error term.
By defining the mean function \(\mu(x, \theta)\) we may obtain several different models, all characterized by the fact that parameters \(\theta\) enter in a nonlinear way into the equation. Parameters are usually estimated by nonlinear least squares which aims at minimizing the residual sum of squares.
\[ \mu(x) = \theta_1 \exp\{\theta_2 x\} \] where \(\theta_1\) is the value at the origin (i.e. \(\mu(x=0)\)), and \(\theta_2\) represents the (constant) relative ratio of change (i.e. \(\frac{d\mu(x)}{dx }\frac{1}{\mu(x)} = \theta_2\)). Thus, the model describes an increasing (exponential growth if \(\theta_2 > 0\)) or decreasing (exponential decay if \(\theta_2 < 0\)) trend with constant relative rate.
\[ \mu(x) = \frac{\theta_1}{1+\exp\{(\theta_2 - x)/\theta_3\}} \] where \(\theta_1\) is the upper horizontal asymptote, \(\theta_2\) represents the x-value at the inflection point of the symmetric growth curve, and \(\theta_3\) represents a scale parameter (and \(1/\theta_3\) is the growth-rate parameter that controls how quickly the curve approaches the upper asymptote).
\[ \mu(x) = \theta_1 \exp\{-\theta_2 \theta_3^x\} \] where \(\theta_1\) is the horizontal asymptote, \(\theta_2\) represents the value of the function at \(x = 0\) (displacement along the x-axis), and \(\theta_3\) represents a scale parameter.
The difference between the logistic and Gompertz functions is that the latter is not symmetric around the inflection point.
\[ \mu(x) = \theta_1 (1 - \exp\{-\theta_2 x\})^{\theta_3} \] where \(\theta_1\) is the horizontal asymptote, \(\theta_2\) represents the rate of growth, and \(\theta_3\) in part determines the point of inflection on the y-axis.
Dipartimento della Protezione Civile: COVID-19 Italia - Monitoraggio della situazione http://arcg.is/C1unv
Source: https://github.com/pcm-dpc/COVID-19
url = "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-andamento-nazionale/dpc-covid19-ita-andamento-nazionale.csv"
COVID19 <- read.csv(file = url, stringsAsFactors = FALSE)
COVID19$data <- as.Date(COVID19$data)
# DT::datatable(COVID19)Warnings
- 15/04/2020: dati Regione Friuli Venezia Giulia ricalcolati (ricalcolo di isolamento domiciliare e dimessi/guariti)
- 12/04/2020: dati P.A. Bolzano ricalcolati (ricalcolo dati guariti -110 rispetto a ieri).
- 10/04/2020: dati Regione Molise parziali (dato tamponi non aggiornato).
- 29/03/2020: dati Regione Emilia Romagna parziali (dato tampone non aggiornato).
- 26/03/2020: dati Regione Piemonte parziali (-50 deceduti - comunicazione tardiva)
- 18/03/2020: dati Regione Campania non pervenuti.
- 18/03/2020: dati Provincia di Parma non pervenuti.
- 17/03/2020: dati Provincia di Rimini non aggiornati
- 16/03/2020: dati P.A. Trento e Puglia non pervenuti.
- 11/03/2020: dati Regione Abruzzo non pervenuti.
- 10/03/2020: dati Regione Lombardia parziali.
- 07/03/2020: dati Brescia +300 esiti positivi
# create data for analysis
data = data.frame(date = COVID19$data,
y = COVID19$totale_casi,
dy = reldiff(COVID19$totale_casi))
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
##
## Formula: y ~ exponential(x, th1, th2)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 14994.832477 1800.696639 8.327 0.0000000000256 ***
## th2 0.046739 0.002496 18.723 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17800 on 55 degrees of freedom
##
## Number of iterations to convergence: 12
## Achieved convergence tolerance: 0.000006995mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
##
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 182505.2248 2401.3185 76.00 <2e-16 ***
## xmid 34.5838 0.3128 110.55 <2e-16 ***
## scal 7.6360 0.2078 36.76 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3810 on 54 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.0000004779mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
# start = list(Asym = coef(mod2)[1])
# tmp = list(y = log(log(start$Asym) - log(data$y)), x = data$x)
# b = unname(coef(lm(y ~ x, data = tmp)))
# start = c(start, c(b2 = exp(b[1]), b3 = exp(b[2])))
# mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data, start = start,
# control = nls.control(maxiter = 1000))
summary(mod3)
##
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 213901.7727687 1858.7458022 115.08 <2e-16 ***
## b2 8.9596130 0.1954790 45.83 <2e-16 ***
## b3 0.9336918 0.0008955 1042.59 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1372 on 54 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.0000007053richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2)
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss,
y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
# trace = TRUE, algorithm = "plinear",
control = nls.control(maxiter = 1000, tol = 0.1))
# algorithm is not converging...
summary(mod4)
##
## Formula: y ~ richards(x, th1, th2, th3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 224716.563052 2559.610301 87.79 <2e-16 ***
## th2 0.059056 0.001181 50.02 <2e-16 ***
## th3 6.356968 0.189742 33.50 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1347 on 54 degrees of freedom
##
## Number of iterations to convergence: 3
## Achieved convergence tolerance: 0.00739
# library(nlmrt)
# mod4 = nlxb(y ~ th1*(1 - exp(-th2*x))^th3,
# data = data, start = start, trace = TRUE)models = list("Exponential model" = mod1,
"Logistic model" = mod2,
"Gompertz model" = mod3,
"Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
df = sapply(models, function(m) attr(logLik(m), "df")),
Rsquare = sapply(models, function(m)
cor(data$y, fitted(m))^2),
AIC = sapply(models, AIC),
AICc = sapply(models, AICc),
BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)| loglik | df | Rsquare | AIC | AICc | BIC | ||
|---|---|---|---|---|---|---|---|
| Exponential model | -637.7262 | 3 | 0.9321684 | 1281.4525 | 1281.9053 | 1287.5816 | |
| Logistic model | -549.3269 | 4 | 0.9970040 | 1106.6539 | 1107.4231 | 1114.8261 | |
| Gompertz model | -491.0959 | 4 | 0.9995541 | 990.1917 | 990.9610 | 998.3640 | |
| Richards model | -490.0401 | 4 | 0.9996004 | 988.0802 | 988.8494 | 996.2524 | *** |
ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(aes(y = fitted(mod1), color = "Exponential")) +
geom_line(aes(y = fitted(mod2), color = "Logistic")) +
geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
geom_line(aes(y = fitted(mod4), color = "Richards")) +
labs(x = "", y = "Infected", color = "Model") +
scale_color_manual(values = cols) +
scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 10000),
minor_breaks = seq(0, coef(mod2)[1], by = 5000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))last_plot() +
scale_y_continuous(trans = "log10", limits = c(100,NA)) +
labs(y = "Infected (log10 scale)")df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
fit1 = predict(mod1, newdata = df),
fit2 = predict(mod2, newdata = df),
fit3 = predict(mod3, newdata = df),
fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,c("fit2", "fit3")]))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
coord_cartesian(ylim = ylim) +
labs(x = "", y = "Infected", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 10000),
minor_breaks = seq(0, max(ylim), by = 5000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))
pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]
pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]
pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]
pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]
# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
subset(pred2, x == max(data$x)+1, select = 2:5),
subset(pred3, x == max(data$x)+1, select = 2:5),
subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
## date fit lwr upr
## 58 2020-04-21 225551 177138 274244
## 581 2020-04-21 174382 165138 182393
## 582 2020-04-21 180908 177586 184444
## 583 2020-04-21 182096 178671 185826
ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
coord_cartesian(ylim = c(0, max(ylim))) +
labs(x = "", y = "Infected", color = "Model") +
scale_y_continuous(minor_breaks = seq(0, max(ylim), by = 10000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# create data for analysis
data = data.frame(date = COVID19$data,
y = COVID19$deceduti,
dy = reldiff(COVID19$deceduti))
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
##
## Formula: y ~ exponential(x, th1, th2)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 1385.521489 185.775603 7.458 0.00000000067 ***
## th2 0.053077 0.002734 19.412 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2210 on 55 degrees of freedom
##
## Number of iterations to convergence: 13
## Achieved convergence tolerance: 0.000003294mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
##
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 24464.7472 320.8056 76.26 <2e-16 ***
## xmid 37.2904 0.2823 132.09 <2e-16 ***
## scal 7.0715 0.1815 38.97 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 452.2 on 54 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.000005596mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
# manually set starting values
# start = list(Asym = coef(mod2)[1])
# tmp = list(y = log(log(start$Asym) - log(data$y)), x = data$x)
# b = unname(coef(lm(y ~ x, data = tmp)))
# start = c(start, c(b2 = exp(b[1]), b3 = exp(b[2])))
# mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data, start = start,
# control = nls.control(maxiter = 10000))
summary(mod3)
##
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 29277.647523 246.754085 118.7 <2e-16 ***
## b2 12.600917 0.293038 43.0 <2e-16 ***
## b3 0.930523 0.000849 1096.0 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 152 on 54 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.000002976richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2)
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss,
y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
# trace = TRUE, algorithm = "port",
control = nls.control(maxiter = 1000))
summary(mod4)
##
## Formula: y ~ richards(x, th1, th2, th3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 30358.238565 315.136269 96.33 <2e-16 ***
## th2 0.064976 0.001091 59.58 <2e-16 ***
## th3 9.641781 0.289109 33.35 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 151.6 on 54 degrees of freedom
##
## Number of iterations to convergence: 6
## Achieved convergence tolerance: 0.0000003978models = list("Exponential model" = mod1,
"Logistic model" = mod2,
"Gompertz model" = mod3,
"Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
df = sapply(models, function(m) attr(logLik(m), "df")),
Rsquare = sapply(models, function(m)
cor(data$y, fitted(m))^2),
AIC = sapply(models, AIC),
AICc = sapply(models, AICc),
BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)| loglik | df | Rsquare | AIC | AICc | BIC | ||
|---|---|---|---|---|---|---|---|
| Exponential model | -518.8139 | 3 | 0.9421361 | 1043.6278 | 1044.0806 | 1049.7570 | |
| Logistic model | -427.8493 | 4 | 0.9976506 | 863.6986 | 864.4678 | 871.8708 | |
| Gompertz model | -365.6899 | 4 | 0.9996906 | 739.3798 | 740.1491 | 747.5520 | |
| Richards model | -365.5423 | 4 | 0.9997028 | 739.0845 | 739.8537 | 747.2567 | *** |
ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(aes(y = fitted(mod1), color = "Exponential")) +
geom_line(aes(y = fitted(mod2), color = "Logistic")) +
geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
geom_line(aes(y = fitted(mod4), color = "Richards")) +
labs(x = "", y = "Deceased", color = "Model") +
scale_color_manual(values = cols) +
scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 1000),
minor_breaks = seq(0, coef(mod2)[1], by = 500)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))last_plot() +
scale_y_continuous(trans = "log10", limits = c(10,NA)) +
labs(y = "Deceased (log10 scale)")df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
fit1 = predict(mod1, newdata = df),
fit2 = predict(mod2, newdata = df),
fit3 = predict(mod3, newdata = df),
fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,-(1:3)]))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
coord_cartesian(ylim = ylim) +
labs(x = "", y = "Deceased", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 1000),
minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))
pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]
pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]
pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]
pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]
# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
subset(pred2, x == max(data$x)+1, select = 2:5),
subset(pred3, x == max(data$x)+1, select = 2:5),
subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
## date fit lwr upr
## 58 2020-04-21 30101 23736 36473
## 581 2020-04-21 23223 22049 24116
## 582 2020-04-21 24128 23743 24473
## 583 2020-04-21 24237 23838 24650
ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
coord_cartesian(ylim = c(0, max(ylim))) +
labs(x = "", y = "Deceased", color = "Model") +
scale_y_continuous(minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# create data for analysis
data = data.frame(date = COVID19$data,
y = COVID19$dimessi_guariti,
dy = reldiff(COVID19$dimessi_guariti))
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
##
## Formula: y ~ exponential(x, th1, th2)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 1242.145522 116.084479 10.70 4.67e-15 ***
## th2 0.065967 0.001851 35.64 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2094 on 55 degrees of freedom
##
## Number of iterations to convergence: 11
## Achieved convergence tolerance: 0.000009463mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
##
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 65614.5750 2245.2100 29.22 <2e-16 ***
## xmid 48.2167 0.6943 69.44 <2e-16 ***
## scal 9.1178 0.2400 37.98 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 749.7 on 54 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.000001772mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
summary(mod3)
##
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 141003.27102 7534.79085 18.71 <2e-16 ***
## b2 8.51976 0.15547 54.80 <2e-16 ***
## b3 0.96425 0.00108 892.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 390.3 on 54 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.0000003524richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2)
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss,
y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
# trace = TRUE, # algorithm = "port",
control = nls.control(maxiter = 1000))
summary(mod4)
##
## Formula: y ~ richards(x, th1, th2, th3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 294726.102196 39976.868993 7.372 0.00000000102 ***
## th2 0.018380 0.001573 11.688 < 2e-16 ***
## th3 4.168223 0.161473 25.814 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 334.5 on 54 degrees of freedom
##
## Number of iterations to convergence: 19
## Achieved convergence tolerance: 0.000002674models = list("Exponential model" = mod1,
"Logistic model" = mod2,
"Gompertz model" = mod3,
"Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
df = sapply(models, function(m) attr(logLik(m), "df")),
Rsquare = sapply(models, function(m)
cor(data$y, fitted(m))^2),
AIC = sapply(models, AIC),
AICc = sapply(models, AICc),
BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)| loglik | df | Rsquare | AIC | AICc | BIC | ||
|---|---|---|---|---|---|---|---|
| Exponential model | -515.7289 | 3 | 0.9848721 | 1037.4578 | 1037.9107 | 1043.5870 | |
| Logistic model | -456.6637 | 4 | 0.9979485 | 921.3273 | 922.0966 | 929.4995 | |
| Gompertz model | -419.4481 | 4 | 0.9993663 | 846.8962 | 847.6655 | 855.0684 | |
| Richards model | -410.6614 | 4 | 0.9995213 | 829.3229 | 830.0921 | 837.4951 | *** |
ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(aes(y = fitted(mod1), color = "Exponential")) +
geom_line(aes(y = fitted(mod2), color = "Logistic")) +
geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
geom_line(aes(y = fitted(mod4), color = "Richards")) +
labs(x = "", y = "Recovered", color = "Model") +
scale_color_manual(values = cols) +
scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 1000),
minor_breaks = seq(0, coef(mod2)[1], by = 500)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))last_plot() +
scale_y_continuous(trans = "log10", limits = c(10,NA)) +
labs(y = "Recovered (log10 scale)")df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
fit1 = predict(mod1, newdata = df),
fit2 = predict(mod2, newdata = df),
fit3 = predict(mod3, newdata = df),
fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,-(1:3)]))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
coord_cartesian(ylim = ylim) +
labs(x = "", y = "Recovered", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 1000),
minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))
pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]
pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]
pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]
pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]
# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
subset(pred2, x == max(data$x)+1, select = 2:5),
subset(pred3, x == max(data$x)+1, select = 2:5),
subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
## date fit lwr upr
## 58 2020-04-21 56994 50881 63470
## 581 2020-04-21 48894 46554 50695
## 582 2020-04-21 50278 49217 51229
## 583 2020-04-21 50726 49787 51539
ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
coord_cartesian(ylim = c(0, max(ylim))) +
labs(x = "", y = "Recovered", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 5000),
minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))df = data.frame(date = COVID19$data,
positives = c(NA, diff(COVID19$totale_casi)),
swabs = c(NA, diff(COVID19$tamponi)))
df$x = as.numeric(df$date) - min(as.numeric(df$date)) + 1
# df$y = df$positives/df$swabs
df$y = df$positives/c(NA, zoo::rollmean(df$swabs, 2))
df = subset(df, swabs > 50)
# DT::datatable(df[,-4], )ggplot(df, aes(x = date)) +
geom_point(aes(y = swabs, color = "swabs"), pch = 19) +
geom_line(aes(y = swabs, color = "swabs")) +
geom_point(aes(y = positives, color = "positives"), pch = 0) +
geom_line(aes(y = positives, color = "positives")) +
labs(x = "", y = "Number of cases", color = "") +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = palette()[c(2,1)]) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))ggplot(df, aes(x = date, y = y)) +
geom_smooth(method = "loess", se = TRUE, col = "black") +
geom_point(col=palette()[4]) +
geom_line(size = 0.5, col=palette()[4]) +
labs(x = "", y = "% positives among admnistered swabs (two-day rolling mean)") +
scale_y_continuous(labels = scales::percent_format(),
breaks = seq(0, 0.5, by = 0.05)) +
coord_cartesian(ylim = c(0,max(df$y, na.rm = TRUE))) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))df = data.frame(date = COVID19$data,
hospital = c(NA, diff(COVID19$totale_ospedalizzati)),
icu = c(NA, diff(COVID19$terapia_intensiva)))
df$x = as.numeric(df$date) - min(as.numeric(df$date)) + 1ggplot(df, aes(x = date, y = hospital)) +
geom_smooth(method = "loess", se = TRUE, col = "black") +
geom_point(col = "orange") +
geom_line(size = 0.5, col = "orange") +
labs(x = "", y = "Change hospitalized patients") +
coord_cartesian(ylim = range(df$hospital, na.rm = TRUE)) +
scale_y_continuous(minor_breaks = seq(min(df$hospital, na.rm = TRUE),
max(df$hospital, na.rm = TRUE),
by = 100)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))ggplot(df, aes(x = date, y = icu)) +
geom_smooth(method = "loess", se = TRUE, col = "black") +
geom_point(col = "red2") +
geom_line(size = 0.5, col = "red2") +
labs(x = "", y = "Change ICU patients") +
coord_cartesian(ylim = range(df$icu, na.rm = TRUE)) +
scale_y_continuous(minor_breaks = seq(min(df$icu, na.rm = TRUE),
max(df$icu, na.rm = TRUE),
by = 10)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))